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We present a simple, pedagogical introduction to the statistics of extreme values. Motivated 
by a string of record high temperatures in December 1998, we consider the distribution, averages 
and lifetimes for a simplified model of such "records." Our "data" are sequences of independent 
random numbers all of which are generated from the same probability distribution. A remarkable 
universality emerges: a number of results, including the lifetime histogram, are universal, that is, 
independent of the underlying distribution. 



I. INTRODUCTION AND MOTIVATION 

In December 1998, in the aftermath of El Nino and its companion, La Nina, the weather in the Roanoke, 
Virginia, area was unusually mild. Weather data have been collected here since 1934, and record highs and lows for 
any particular day are known. As part of the daily weather forecasts, local TV stations report these record values 
and compare them to the highest and lowest temperature values of the day. Remarkably, during the nine days from 
November 29 to December 7, 1998, the previous record highs were broken five times and tied oncel |l[] One might 
wonder, as we did, how frequently such a series of records could possibly occur. When only few weather data are 
available, such as in the early years of record keeping, it is obviously quite easy to experience new extremes. However, 
as the data sets become larger, the record highs (and lows) are pushed to higher (lower) values, so that the setting 
of a new record becomes a far less frequent event. Thus, we began exploring questions such as: How probable was it 
to set a new record in 1998, 64 years after the records began? How do record highs increase with time? How long do 
records typically survive before they are broken? 

Not surprisingly, similar questions have been posed before. It appears that the earliest studies are due to N. 
Bernoulli who analyzed life expectancies in 1709 Later, flood control and structural safety issues were considered, 
to name just a few of the numerous applications. Beginning in the 1920's, the mathematical techniques were developed 
and the study of records, or extremes, became known as "extreme value statistics," [|| which is still an active area 
of research. In this paper, we provide a pedagogical introduction to some of the most basic results. We begin in 
Section II with the simplest model for records and a concise statement of the problem. There is no attempt to address 
real records. As a result of complex physical processes, the statistics of real weather records is undoubtedly far more 
intricate than that generated by simple random numbers. The quotation marks in the title should remind the reader 
that actual weather records are not the subject of this article. Section III is devoted to the full distribution function 
for the model records, their averages, and standard deviations. Next, we discuss the record lifetimes and derive the 
associated distribution in Section IV. Although we focus on "record highs," a completely analogous line of reasoning 
can be pursued for "record lows." Q In the final section, we turn to more general questions and conclude by listing 
a number of problems which, to the best of our knowledge, are still unsolved. Some technical details are provided in 
the Appendix. 



II. A SIMPLE MODEL FOR RECORDS 

Our much simplified model for the physical data (temperatures, water levels, etc.) is based on a probability 
density p{x) for a real variable x. For example, if we are considering temperatures, p{x)dx might be the probability 
that the temperature at noon, on a specific day of the year, takes a value between x and x + dx. The "data" in our 
simplified model are just a sequence of random numbers: 

{xil),x{2),x{3),...,x{t),...}. (1) 

In keeping with the language of weather records, we refer to the (integer) label of these numbers as "time" (or "year"): 
t = 1, 2, 3, . . . Each of the x's is drawn from the same distribution p{x). Thus, our data form a set of independent, 
identically distributed random variables. This condition is probably the most serious shortcoming when applied to 
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physical reality, where major correlations or variations can be expected. For example, it precludes "global warming," 
a situation in which the underlying distribution p{x) itself is a function of time. 

Before discussing the sequence and the records, let us provide some details about the distribution p{x). It may 
be defined over the entire real axis or restricted to an interval, that is, it may have infinite or finite support. For 
simplicity, we require it to be continuous and positive, which excludes, for example, dice throwing. As a result, we 
can ignore the possibility of ties. Of course, it must be properly normalized, that is, Jp{x)dx — 1. Here and in 
the following, any integral limits that are not explicitly specified are to be taken as the appropriate (upper and/or 
lower) bounds of the support. Because the distribution p{x) reflects how we model our data set, we refer to it as 
the "model distribution," or simply "the model." In particular, we will investigate to what extent our results for 
record distributions and lifetimes depend on the underlying model. For later reference, we also introduce a second 
distribution, simply related to p{x): 

q{R) EE j p{x) dx (2) 



Clearly, q{R) is the probability that a random number, drawn from the distribution p{x), will not exceed R. Thus, 
is a monotonic function, varying from to 1. The reader familiar with random numbers on a computer will 
recognize that the inverse of this function, R{q), is just the operation to generate random numbers for an arbitrary 
density p{x) from the uniform distribution on the interval [0, 1]. Following Galambos we call q(R) the "common 
distribution function." 

Returning to the sequence (^, we define the "record" R{t) as the largest number in a string of length t: 

R{t) = max{— oo, a;(2), a;(3), . . . ,x{t)} 

= snp{x{l),x{2),x{3),...,x{t)}, (3) 



where we have included R{0) = —oo as the (arbitrary) initial value. After generating the next new number x{t + 1), 
we determine whether the record has been broken, according to 

a;(t+l)<i?(i) 
^^^+^>-\xit + l) if x{t + l)>R{t). 



This procedure is continued until we have obtained a sequence of length N. Of course, the records R{t) themselves 
are stochastic quantities. So, we can define a density for R: 

P{R,t) 



so that the probability for finding the record to lie between R and R + dR is just P{R, t)dR. Our goal is to determine, 
given the underlying model distribution p{x), this P{R, i) or, at a simpler level, the average record: 

(i?(<)) = j dRRP{R,t). (5) 



Before delving into the analytic aspects, let us consider computer simulations of these records. To study their 
statistics, we generate a large number, M, of sequences (modeling, for example, weather data from M cities): Ri{t),i = 
1, ...,M . Based on this ensemble of sequences, we can define the average record as a function of time 
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As an ilustration, following are two {M = 2) sequences of ten {N — 10) random numbers generated by RAND() in 
MS-EXCEL (quoted to 4 digits for simplicity): 

xi (t) = {0.6855, 0.3140, 0.4555, 0.8082, 0.0023, 0.9722, 0.9372, 0.1412, 0.7057, 0.9699} 



X2{t) = {0.1042, 0.8997, 0.7126, 0.4638, 0.2598, 0.9864, 0.2586, 0.0577, 0.7777, 0.8863} 
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and the associated sequences of records: 

Ri (t) = {0.6855, 0.6855, 0.6855, 0.8082, 0.8082, 0.9722, 0.9722, 0.9722, 0.9722, 0.9722} 

R^{t) = {0.1043, 0.8997, 0.8997, 0.8997, 0.8997, 0.9864, 0.9864, 0.9864, 0.9864, 0.9864} 

The average record is, therefore, 

{R{t)) = {0.3949, 0.7926, 0.7926, 0.8540, 0.8540, 0.9793, 0.9793, 0.9793, 0.9793, 0.9793} 

Since RAND() outputs a random number uniformly distributed between and 1, it is hardly surpirsing to see that 
the average record bumps up against 1 here. To get a good grasp of this process, the reader should attempt, say, 100 
sequences of = 300 on a spreadsheet. On a typical modern PC, it takes only "the blink of an eye", literally! 

For the computer simulations presented in this article, we used a high quality random generator (RAN2 from 
Numerical Recipies and averaged over M — 10^ sequences of length N = 10'^. This procedure requires just a few 
minutes running LINUX on a 450 MHz Pentium II, and produces excellent statistics. In fact, with 10^ sequences, it is 
possible to generate a reasonable picture of the whole probability distribution for the records. A theoretical approach 
to the averages will be our next task. 



III. THE RECORD PROBABILITY DISTRIBUTION 

A. The evolution equation and its solution 

Our goal in this section is to find an analytic form for the probability density P{R, t). We will do this recursively, 
by assuming that we know the value of the record, say R' , at time t — 1. Then, we seek the conditional probability, 
P{R, t\R' , t — 1), that the record, at time t, has the value R, provided it had the value R' at time t — 1. Fortunately, 
it is very easy to write down this quantity! Clearly, it vanishes for R < R' , because the new record cannot be smaller 
than the old one. This leaves us with two possibilities: Either the old record stays the same so that R = R', or it is 
broken, resulting in the new value R > R'. The former is the case if the new random number, x{t), does not exceed 
R' . Reviewing Eq. (j^), we see that this case occurs with probability q{R). In contrast, if x{t) exceeds R' , then it sets 
the new record R. This latter case occurs with probability p{R) ■ Summarizing, the conditional probability is 

f R<R' 
P{R,t\R',t-l) = \ q{R) for R = R' 
[ p{R) R> R' 

= q{R)d{R- R') +p{R)eiR- R') . (6) 

The Heavyside step function O i unity if its argument is positive and zero otherwise. Its derivative is just the Dirac 
delta function 6, so that the two terms in Eq. (0) can be combined into 

P{R, t\R', i - 1) = ^ [q{R)e{R - R')] • (7) 

Because we have exhausted all possibilities for R, the conditional probability must be normalized with respect to an 
integration over R. This condition is easily checked: because P is a total derivative, its integral is just qQ evaluated 
at the limits. So, we have / dRP{R, t\R',t- 1) ^ 1. 

From the conditional probability, it is a simple step to arrive at our main target: the record probability density 
(the "record" distribution) P{R,t): 

P{R, t)^J dR'P{R, t\R\t- l)P(i?', t - 1) . (8) 
Substituting the explicit form (H) for the conditional probability, we obtain a recursion relation for P: 
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dR' — [-7(i?)e(i? - R')] P{R\ t ~ 1) 



_d_ 
dR 
d_ 
'dR 



q{R) j dR' Q{R - R') P{R', t - 1) 
q{R)j dR' P{R',t-l). (9) 



Clearly, this form is still slightly unwieldy, due to the integration. Let us define the "barrier" distribution Q{R,t) to 
be the probability that at time t the record is at R or lower: 

QiR,t) = / dR' P{R',t), (10) 

so that 

PiR,t) = -^Q{R,t) . (11) 

From Eqs. - ^), the function Q satisfies a very simple recursion equation, namely, 

Q{R,t)^ q{R)Q{R,t^l). (12) 

Equation ( |l^ ) has a simple interpretation. Recall that q{R) is the probability that the next random number is less 
or equal to R, and Q{R,t) is the probability that, at time t, the record has not exceeded R. The product of the two 
gives the probability that the record remains unbroken after the next time step. 



The recursion relation (12) is easily solved: 

QiR, t) - [q{R)]' QiR, 0) = [qiR)f (13) 

because Q{R,0) — 1 for all R > — oo. We can deduce two important properties from this general solution for this 
barrier distribution Q. Because q{R) is a monotonic function, so is Q{R). This behavior of Q(R) may be interpreted 
as at any given time, it is more difficult to go over a higher barrier (break a record). On the other hand, because 
q < 1 for any fixed i? < 1, Q decreases with t which implies that any given record can be broken, provided we wait 
long enough! We emphasize that these "sensible" results are completely independent of the details of the underlying 
distribution p{x). 

From (p^), the record distribution follows: 

P{R, t) = -^QiR, t) = t [q{R)t' ^ = t m)] .*- V(i?) (14) 

Once P{R, t) is known, we can compute average values for the records, (i?(t)), through the defintion (^, as well as the 
standard deviations and all higher moments. For later reference, we provide a simplified representation of the integral 
in Eq.(||). Substituting (Q into (||), we have {R{t)) = JdRRt [q{R)]^~^ p{R). Next, recall that q{R) is monotonic, so 
that it can be inverted uniquely to give the function R{q). Now, change the integration variable from R to q, using 
p{R)dR = dq. The result is 

{R{t))^tf dqR{q)q'~^ . (15) 

Note that the limits of integration are now unique^ that is, independent of the underlying model! 

In the next section, we will illustrate the characteristics of P(-R, t) and {R{t)) with two simple, explicitly solvable 
distributions p. 



B. Two exactly solvable examples: flat and purely exponential distributions 



Let us illustrate our results by considering two particularly simple cases: a flat distribution 



4 



p{x) = 1 for < X < 1 (16) 

and a pure exponential 

p(x) = e-"" for < X . (17) 

These distributions differ significantly in that the former has an upper bound for the allowed values of the data. As 
a result, the possible record values are also bounded! In contrast, the latter distribution extends to infinity, setting 
no limits on the possible records. Studying these two simple distributions will lead us to find rather different, but 
hopefully generic behavior for bounded versus unbounded models. 

The flat distribution ( p^ corresponds to data whose values are equally probable over a given interval. From Eq. 
(H), we find q{R) = R (for < i? < 1) and 

Q(i?,i) = [g(i?)]* = i?* (18) 

so that 

P(i?, t) = dQ/dR = tR*-^ . (19) 

Both results are easily interpreted. The barrier distribution Q{R, t) displays explicitly the general properties discussed 
above: increasing with R at fixed t and decreasing with t at fixed R < 1. In contrast, the record distribution P{R,t) 
displays a maximum in t for a fixed R. For early times, the probability to find the record at R is low because this 
value has not yet been reached, whereas, for late times, it has already been exceeded! 

We can also study P{R, t) as a function of R for a fixed t. The maximum value always occurs at i? = 1, increasing 
linearly with time. If we apply the normalization condition JPdR = 1, the width of the peak must narrow with time. 
In other words, the late time records are "piled up" just below the highest allowed value (unity in this case). Let us 
investigate how this feature is refiected in the average record. Using Eq. (||), we find easily 

t 

{R{t)) = t J dRRR'-^ = . (20) 

As expected, {R(t)) increases monotonically as a function of time, reaching its upper bound at t — oo: 

hm {R{t)) = 1. (21) 

t — *oo 

Of course, its rate of increase must vanish in this limit. In this case, the asymptotic is t^^ . This behavior is illustrated 
in Fig. la, which shows that the exact result, Eq. (^0|) is in excellent agreement with Monte Carlo data averaged over 
10^ sequences with 1000 entries each. 

Several of these properties are generic in the following sense. If the underlying distribution p has an upper bound, 
that is, p{x) = ioT X > B, then limt^oo {R{t)) = B. Not surprisingly, the behavior of p near B will dictate the 
asymptotics. For example, if we assume p{x) iik{B — x)^~ , so that q{R) 1 — k{B — RY^ for x, then R ^ B, 
and we can show that B - {R{t)) ^ F (1/^) [kty^'^" , which is a generalization of the flat case. 

We next turn to the purely exponential distribution ( [T^ ) and its associated q{R) = l—p{R). The two characteristic 
distribution functions are 

Q(i?,i)= [l-e-«]* (22) 

and 

P{R, t) = t e-^[l - e-^Y'^ . (23) 

Many properties are qualitatively similar to the flat case. One difference is that the position, Ro{t), of the maximum 
in P{R, t) increases with time indefinitely: 

Ro{t) = Int . 

Once again, this behavior is reflected in the average record. Using Eq. ( p^ ) and deferring details to Appendix A, we 
obtain 
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(24) 



k=l 



which is obviously a monotonicaUy increasing function of time. To exhibit the asymptotic behavior for large times, 
we can write Eq. (24) in more compact notation: 



- + 1) - V (1) , 
where ^ is the digamma function with known asymptotic form. Q As a result, 

~lni + 7 + 0(l/t) fort^oo. 



(25) 



(26) 



where 7 ~ 0.5772. Note that the rate of increase also vanishes with time, but with a slower decay, i ^. In Fig. lb, 
we show the comparison of the asymptotic result {Uw with Monte Carlo data. The agreement is clearly excellent. 
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FIG. 1. Average record {R{t)) versus time t, for (a) flat and (b) exponential distributions. The diamonds are Monte Carlo 
simulation data; the solid line are the theoretical results, Eq. (|2^) and Eq. (^6]), respectively. 

As before, we can consider the behavior of the average record for more general unbounded distributions p{x). Here, 
the argument rests on a saddle-point approximation for P{R,t) around its maximum value, Ro{t). Asymptotically, 
the average can be approximated by the position of the maximum. Because Ro{t) becomes very large for late times, 
the asymptotics should be controlled by the behavior of p{x) for large x. Some examples of possible asymptotic 
behaviors of unbounded p's and their associated saddle points Rq {t) are 

1. a power law p ^ (with a > 2 to ensure that (R) exists), resulting in Ro{t) ^ t^/^"^^^; 

2. an exponential p ^ exp(— /ix), giving Ro{t) ~ hit; 

3. a Gaussian p ~ exp(— /ii?^)for which i?o(^) ^ Vhit. 
We invite the reader to carry out simulations for, say, 

p{x) = 2/x^, X e [1, 00] 

and check the results against the predictions in item 1. Wc would also like to caution the reader that the approach 
to asymptopia for the Gaussian is extremely slow. In particular, we find that this regime lies beyond t = 10^. 



IV. THE DISTRIBUTION OF RECORD LIFETIMES 



Next, we turn to a key question in the study of floods or earthquakes. What is the typical time span between two 
large events? At the practical level, how much time do we have to construct dams or to repair levees before the next 
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record-breaking flood? Of course, our study will not provide an exact time span until the next disaster; it will only 
give an estimate of how long a record might survive. We refer to the time span between the setting of a record and its 
subsequent breaking as its "lifetime" (or "record time" In the following, we investigate the distribution of these 
record lifetimes. Specifically, we will consider data sequences with TV entries. For each sequence, we will identify the 
associated records and their lifetimes. By analyzing a large number of data sequences, we can compile a histogram, 
T/v(m), of how likely a record would survive for a time span m. 



A. A tree of lifetimes 



Let us introduce a simple representation of all possible histories of records: a tree-like structure. We begin by 
generating a data sequence {a;(l), x(2), a;(3), . . . , x{N)}. Because we only need to know where the records occur in this 
sequence, we can associate this sequence with the following binary string. If x{i) is a new record, replace it by the letter 
R ("record"); otherwise, replace it by L ("lower"). Note that, had we used a discrete underlying distribution p, then 
we would have to consider the complication of ties. As an example, the sequence {0.2, 0.4, 0.3, 0.1, 0.6, 0.7, 0.2, 0.4, 0.8} 
is replaced by {R, R, L, L, R, R, L, L, R}. By convention, the first entry is always R. If a record is established at time t 
and broken at time t+m, then that record's lifetime is defined to be m. Clearly, the binary string is much simpler than 
the original data sequence, but it contains enough information about record lifetimes for us to predict the distribution 
TAr(m). 

The binary strings are easily visualized via a tree structure. Starting from a single vertex (the "ancestor" ) i? on 
the first line {t = 1), time runs downwards. At time t = 2 (the second line), our string has two possible entries: R 
or L. To represent these, we draw two branches from the first line to the second: one to the right (labeled R), and 
one towards the left (labeled L) . Each of these branches ends in a new vertex. These branch again, giving us four 
vertices in total on the third {t = 3) line. Continuing this procedure to the A^th level, we find 2^^^ vertices there. 
As an illustration, the A^ = 4 tree is shown in Fig. 2a. Clearly, all possible binary strings with 4 elements appear in 
this tree, each associated with exactly one vertex on the t — 4 line. 

The record lifetimes associated with a given string are now easily identified: following an L branch means that 
the current record survives, while choosing the R branch implies that a new record is set. Thus, each vertex can be 
labeled with the set of record lifetimes {ti, T2, . . . , r^} leading to it. Fig. 2b shows the "tree of lifetimes" for the A = 4 
case shown in Fig. 2a. Note that the number of entries in these sets varies for different strings. For example, the far 
right string in Fig. 2a, where every record is broken at the next time step, gives rise to {1, 1, 1, 1}, while the far left 
string corresponds to {4}: the record is set sd t — I and survives the next three time steps. A simple recursion relation 
emerges. From a particular entry at time t to the two "daughters" at time t+l, the set {ti , T2 , . . . , Tk-i, r^} branches 
into two: {ri, T2, ■ ■ ■ , rfe_i, -|- 1} and {ti, T2, . . . , r^, 1}. Moreover, because any distribution p{x) will generate the 
same set of binary strings, it is obvious that the associated lifetimes {n, T2, . . . , Tk} are completely independent of the 
underlying model! 



B. The string probabilities 



To complete the construction of the histogram, we need to find the string probability, that is, the probability that 
a specific string will appear. At the Ath level, each string is associated with a specific vertex, which is uniquely labeled 
by the set {ti, T2, . . . , Tk-i,Tk}. So, let us denote this probability by Pn{ti,T2, . . . , r^). At the top level {t — 1), this 
probability is obviously trivial: -Pi(l) = 1- For example, the year record keeping starts, any temperature will be a 
"record" In the second "year" (t = 2), there are two vertices: {ri = 2} and {ri = 1, T2 = 1}, corresponding to the first 
record surviving or being broken, respectively. In terms of the original data sequence {x{l), x{2)}, these possibilities 
are given by a;(l) > x{2) or x(l) < x{2). Let us focus first on the former case for which the probability, -P2(2), is 
just / dxi J dx2 p{xi)p{x2)Q {xi — X2)- At first sight, this expression seems to depend on the model distribution p{x). 
However, recalling Eq. (p^), we can transform the integration variables to 

P2{2) = / dqj dq2 e (qi - 92) - / dqi dq2 = ^ , (27) 







where we have exploited the monotonicity of q{x) to replace & {xi — X2) by Q {qi — 92)- Not only is this integral 
trivial to compute, it is also manifestly independent of the underlying model! In a similar manner, we can compute 
exphcitly that the probability of the latter case, ^2(1, 1), is also 1/2. 
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FIG. 2. Tree of records. At the A''th level of the tree are the 2^ possible histories, (a) The binary strings representing the 
histories of records, (b) The lifetime associated with each string, (c) Probabilities associated with each history, or "string 
probabilities." 

Let us illustrate this process for the next year [t = 3). For the four possible histories, different combinations of 
9-functions appear. Explicitly, their associated probabilities are: 



i^3(3) 


= / dqi dq2 / dq^Q {qi 
Jo Jo Jo 


- 92) B {qi 


- 93) 


^^3(2,1) 


= dqi dq2 / dq:0 {qi 
Jo Jo Jo 


- Q2) 6 {qs 


-qi) 


^3(1,2) 


= dqi dq2 / (ig30 {q2 
Jo Jo Jo 


- qi) (32 


- 33) 


^^3(1, 1,1) 


= dqi dq2 / dq:0 {q2 
Jo Jo Jo 


- Qi) {Q3 


- 92) 



(28) 
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In Fig. 2c, we show all the probabilities in the TV = 4 tree. Note that the sum of all the Pjv's at each level is indeed 
unity; the probability of having any history after N years must be 1. 

In Appendix B, we show that the general result for an arbitrary string of any length N is 



Pn{ti,T2, ...,Tk) 



1 



Tl in + T2) (ti +T2+ T3) 



(29) 



Note that the last factor is actually just N. 

Another somewhat counter- intuitive result concerns S jv (m) , the probability for the last record to survive m steps 
(regardless of what happened earlier). Note that the "last record" is also the "highest record," since the last record 
is necessarily larger than all previous records! Now, to say that this record survives m steps means that the values in 
the rest of the string (m — 1 of them) are lower. Therefore, in a string of N steps, the last record must have occurred 
at the £= {N — m + l)-th step. Thus, to find S'jv(Tn), we ask: what is the probability for the highest record to show 
up at the l-th step? Wc might guess that the highest record is unlikely to occur near the beginning of the string, 
because there are many chances for it to be broken later. On the other hand, if we use the language of athletics, we 
might conclude that the record facing the last athlete may be quite high (given that "many guys have gone before" ) 
and breaking the record may not be easy. So, perhaps S'jv(to) should be peaked in the middle? The surprise is that 
S'jv(to) is independent of m (or equivalently, t)\ In other words, the highestl record may occur at any step with equal 
probabilityl The proof may be found in Appendix C. 



C. The universal lifetime distribution 

Lastly, we turn to the (unnormalized) lifetime distribution, T/v(rn). We again Uustrate by explicit calculations 
of the N = 4 case and relegate the general discussion to Appendix D. Combining the string probabilities with the 
lifetimes, we obtain: 

U4) = P4(4) = i 

T4(3)=P4(3,1) + P4(1,3) = 1 + 1 = 1 

T4(2) = 2P4(2, 2) + P4(2, 1, 1) + P4(l, 2, 1) + P4(l, 1, 2) = ^ (30) 
r4(l) = P4(3, 1) + 2P4(2, 1, 1) + P4(l, 3) + 2P4(1, 2, 1) + 2P4(1, 1, 2) + 4P4(1, 1, 1, 1) = 1. 

Focusing on a particular binary string, we recall that the associated record lifetimes {ti,T2, ■ ■ ■ ,Tk} as well as the string 
probability are entirely model- independent. As a result, the lifetime distribution itself is universal, that is, its form 
does not depend on the underlying model distribution p{x). Moreover, the result for T4{m) suggests a surprisingly 
simple form for the lifetime histogram, namely, 

Tiv(m) = 1/to . (31) 



The general proof can be found in Appendix D. Remarkably, Tjv is not only universal, but also is independent of N, 
that is, the length of the original data sets! So, the probability for a record to survive, for example, 5 time steps is 
the same, no matter how much data we accumulate. Phrased differently, this result is not surprising at all: because 
Tjv(m) is also, roughly speaking, the probability for the next huge disaster to strike after m time steps, it would be 
very disturbing if that probability depended on the length of our data sets: that would imply that we could avoid 
or court disaster by just continuing to take data. Clearly, such a result would be nonsensical. Nobody would believe 
that, by observing the weather, we can change it! 

Although the lack of an iV-dependence can be understood in this way, the universality with respect to the 
underlying model p{x) is a much stronger statement: As long as all underlying data arise from the same distribution, 
the functional form of this distribution is completely irrelevant. Flat, exponential or Gaussian p's all generate identical 
lifetime histograms. In Fig. 3, we show Monte Carlo data for the lifetime histograms of flat and exponential p{x). They 
arc indistinguishable, apart from statistical fluctuations! The agreement with the theoretically expected Tj^{m) = 1/m 
is again excellent. 
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V. CONCLUDING REMARKS 



We have presented a pedagogical introduction to the statistics of extremes. Though we motivated the discussion 
by reference to weather records, the model studied here is much simplified. Generating a sequence of N random 
numbers, all selected from the same underlying distribution p{x), we keep track of the ones which are, say, higher 
than all the preceding numbers - "record highs." Studying the statistics of such strings, we identify several universal 
features, that is, properties which are independent of the underlying distribution p. In particular, we ask how long a 
record survives before being broken, that is, we focus on the lifetimes of records and compile a histogram. Not only is 
that histogram universal, but it is also exceedingly simple: records lasting m steps occur with a decreasing frequency 
of 1/m. We also inquire into some details of these strings. As far as records are concerned, the sequence of numbers 
can be reduced to a binary string of i?'s and L's. At each step, we only keep track of whether the record is broken 
(i?) or not (L). A "tree" representing all possible sequences can be drawn which displays whether a record is broken 
or not at each step. It is natural to ask. What is the probability that records will be broken at specific steps along 



the string? The answer, given by Eq. ( |36| ) or (29), is also universal. Perhaps most surprising is the answer to the 
question. What is the probability that the overall record will occur at the mth step? The answer is independent of 
m. The overall record occurs at any step with equal probability. 

Another interesting question is the following. Averaged over many sequences, how does the overall record "inch 
up" with time? Although the answer is not completely universal, the average behavior falls into one of several classes. 
The simplest is due to a p{x) which is bounded. Then, not surprisingly, the average just runs into this bound. For 
unbounded p's, the asymptotic behavior is mostly dictated by the "tail" of the distribution. In this sense, there is a 
limited form of universality, that is, properties are independent of the details of the rest of Beyond the average 
record, we also studied the probability density, P{R, t), for finding the record at R after t steps. Like the central limit 
theorem, there is universality for large values of t, provided an appropriate time-dependent rescaling of i? — (R) is 
included. However, there are three limiting distributions, as well as the possibility of no limiting distribution. These 
fascinating properties lie outside the scope of this article. We refer the interested reader to the book by Galambos for 
example. 

Looking beyond our simple model for "weather" records, we may inquire about a natural generalization - a 
model for "global warming." Here, we let the underlying distribution drift upwards with time. The simplest is a 
uniform drift, that is, letting the model distribution at time step t, pt{x), assume the form p{x — at) with a > 0. 
Although numerous results still hold, there are also significant differences. Obviously, the average record here must 
increase linearly. Probably the most significant difference between the simple model for "weather" records and this 
"global warming" case is that the limiting distribution for P(i?, t) is expected to assume the form P* {R — at), with no 
time-dependent rescaling. We are not aware of any general results for such drifting p's. Certainly, these expectations 
are confirmed for a flat, drifting pt{x). For large times, we find P(R,t) = diiQ{R,t), with 

Q{R,t)^Q*{R~at), (32) 

where 
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^*(^) = "'F^r^TT% forl-fca<^<l-(fc-l)a. (33) 
r (^/a + 1 - fc) 

To arrive at these results, we relied on a generalized version of (p^: Q{R, t) — nl=i IniR), where = J^Pn{x)dx. 

Clearly, these general forms are applicable for any time-dependent p. For example, we could consider a sequence 
composed of the sum of all random numbers generated before. If the random numbers are distributed such that both 
positive and negative values are present, we may think of the sequence as the value of a stock, making gains and 
losses from day to day. The variations are limited only by the imagination. Unlike the simple model presented above, 
there are very few known results for the distributions of the record lifetimes in general. Needless to say, exploring the 
universality classes will be a task both daunting and rewarding. 
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APPENDICES 



A. The integral for (24) 



To obtain the desired result (|2^), we start from ([l5| ) and the inverse of 



t [ dq[- ln( 
Jo 



-Hl-q)]q'-' =t 



oo oo 

V I ^ V 

n(n + t ) ^ 

* 1 



1 1 



n n + t 



(34) 



fc=i 



k 



Those who like a bit more mathematical rigor may consider lim^^i dq..., which justifies the exchange of the 
integral and sum. 



B. General string probabilities 



We provide some details for computing the general case Pn{ti,T2, . . . ,Tk). First, let us introduce an alternate 
way of labelling the vertices. Instead of the record lifetimes {ri, T2, . . . , t/j}, we keep track of the times {ri, r2, . . . , rfc} 
when records are broken: that is, denotes the position of the ith R along the string. So, we may also denote the 
general string probability by P^v {t'IiT'2t ■ - Tk)- By convention ri = 1, while ?"2 = ri + ti, rj, = r2 + T2, etc: 

Ti = r2 - ri = r2 - 1 
T2 = 7-3 - r2 

Tfe-i = Tk - rfe_i 
Tk = N + l-rk, 
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where the last hne comes from the length of the string: N = rk + Tk ~ 1. 
To obtain P for an arbitrary string, we start with 



Pn iri,r2, ■■■rk) = J dqi . . . dqN x 9 . . . 6 . (35) 

To say that the first record Ri associated with qi has a hfetime of ri means that the next (ti — 1) i?'s (and their asso- 
ciated q's) are lower. Therefore, the first (ri — 1) factors in the integrand are 8 {qi — (72) © {qi — 93) • • ■ © (51 — 9r2-i) 
so that the integration over the variables (72, 93, • • ■ , '?r2-i can be performed triviaUy, with the result qi^ '^. 

Now, the next record is associated with q^^ , so that the next factor in the integrand must be Q [q^^ — qi)- 
There will be no more appearances of qi in the rest of the integrand. So, the integral over qi (up to (7^2) can be 
performed, giving q^l^^ / ('"2 ~ !)■ Note that ti = ^2 — ri = r2 — 1 so that this result can be written simply as <7^^~^/ti. 

In a similar way, we integrate over the — r2 — 1 variables gr-a+ii ■ • • 1 9r3-i up to 5^2: arriving at q^^~'^ /ti or 
qrl~'^ I {i'2 — 1). Performing the integral over qr^ gives 9^3"^/ (^2 — 1) ('"s — !)■ This process can be carried on until 
the last integration (over q^.^^ up to 1). The integrand here consists of the result from the integrals over the previous 
variables qrl~^ I (^2 — 1) (^3 — 1) • • ■ {fk — 1) as well as the g's from the rest of the sequence: grt+i, • ■ ■ , qN- With this 
additional factor of q^^^^'' , the last integral provides a factor of N. The final result is 

PN{ri,r2,...rk) ^ — -\ — — . (36) 

(r2 - 1) (ra - 1) . . . (rfe - 1) iV 

Rewriting the r's in terms of the t's, we have 

ri(ri +T2)... [Linj 

where the last factor is precisely N, because the sum of the lifetimes of all records is just the length of the string. 

One interesting property of these P's follows from the fact that, given a particular string of length N, we can 
"generate" two strings of length + 1, by concatenating either an i? or an L at the end. In the former case, stays 
the same while Tk+i = 1. In the latter case, the value of Tk increases by 1. From Eq. ([29|), we obtain the following 
"recursion" relations: 

N 

Pn+1 {ti,T2, ...,Tk + l) = _^ P/v (ti,T2, . . . ,rfc) . (38) 

^V + l (ti,T2, . . . ,Tfe,Tfe+l = 1) = _^ PjV (ti,T2, . . . ,Tfc) . (39) 



Not surprisingly. 



^'jv+i (n, t-2, . . . , Tfc, 1) + Pn+1 {ti,T2, . . . , Tfc + 1) = Pat (n, r2, . . . , rfe) . (40) 



To illustrate, consider the first pair of entries at the 4*^* level in Fig. 2 and their relationship to the first entry 
at the 3'"'' level {N = 3). From Fig 2c, we read off Pg (3) = 1/3, P4 (4) = 1/4, and P4 (3, 1) = 1/12. These quantities 
satisfy Eqs. iMM). 



C. Probability for strings where the last record survived m steps 



An easy way to arrive at the lifetime histogram rjv(m) = 1/m is through the probability for the last record to 
survive m steps, regardless of what happened earlier. Let us define this quantity as 5jv(™)- To be precise, it is 

SN{fn)=Y,Hrk-rn)PN{{r}), (41) 

{r} 

where S is the Kronecker delta (d is unity if its argument vanishes and zero otherwise). Be careful with this sum 
notation: it stands for summing over k as well, because k is the number of records in the string. Now, because the 
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"last" record is also the overall record, (that is, all other i?'s are lower), it is easy to compute S'Ar(m). Because we are 
summing over all possible ways that the records before this step {£ = N — m + I) are broken, we may write 



5'jv(m) = I dqi. .. dq^ x Q {qg ~ qi) . . .G [qi ~ 9 {qt - qi+i) ...Q{qi- qn) 

dqi iqef^' = l/N. (42) 

As remarked in the main text, this result is not intuitively obvious at all. Intuition might lead us to an 5'Af(m) 
with a maximum in the middle of the string, but Eq. ( ^2|) tells us that the distribution is entirely flat. 

D. Histogram for record lifetimes Tjv(m) 

Note that TNim) is not a real probability in the sense that T > 1. The normalization condition here is 

^P^({r}) = l, (43) 

{r} 

but the histogram is defined by 

k 

T^{m) = Y,PN{{r})J2S{n-m) . (44) 

{r} ^=l 

Again, we caution the reader that the J2{t} ^^^^ involves a sum over k as well. Summing over m produces J2{r} > 

Now, given a string of N random numbers, the only way for a record lifetime to be N is that the first record 
survives. Thus, Tn{N) is precisely Pn{ti — N), so that 

TNiN) = l/N. (45) 

Our goal reduces to proving 

TAr+i(TO) = TAr(m) for 1 < m < TV . (46) 
There are two types of contributions to r/v(r7i). One is from all the "interior" records: 

k-l 

f^(m)=^P^({T})^<5(r,-m) (47) 

{r} i=l 

that is, Ti = m with i < k. The other piece, from the last record alone, is precisely Sn{iti). For T/v+i, let us start 
with T/v (m) and go from to TV + 1 , by looking at 

k-l 

J2PN+i{{r})J2S{n-m), (48) 

{r} i=l 

where the k's appearing here are still those associated with TN{m) (that is, the last record may be labeled by fc + 1). 
Because the sum over and Tk+i can be performed without the (5's, Eqs. (|38| ) and (|39| ) can be used to obtain the 
following recursion relation 

k-l k-l 

E ({^}) E ^ (^^ - = T.p^ ({^}) E - ^) • (49) 

{r} i=l {r} i=l 
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But the quantity (^8|) is not the only contribution to Tjv+i, because concatenating an R to an A'^-string will produce 
an "interior" record for the {N + l)-string. Focusing on a specific m, this extra bit is just 

{ti,T2, . . . ,Tfe,Tfc+i = l)(5(Tfc - m) , (50) 

{r} 



which is 



{r} 

So, we conclude 



^ j;j^Pn iTi,T2, . . . ,Tk) 6 (Tk - m) = j^^SN{m). (51) 



iV + 1 

With the help of this equation, we have 

TN+iim) = Tjv+i(to) + SN+iim) 



fAr+i(m) = T'Ar(m) + %^. (52) 



~ S'jv(m) 
TN(m) + + SN+i(m). 



But, according to Eq. (E2h, 



TV 

SN+i{m) = j^^i ^N{m) (53) 

so that 

TN+iim) ^ TN{m) + SNim) ^ TNim) . (54) 

Thus, using (p5|), we arrive at 

TN{m) = 1/m (55) 

for any N > m. 



[1] The data can be found at: littp://www. wdbj7.com/climate/climate. htm 



[2] See for example, E. J. Gumbel, The Statistics of Extremes (Columbia University Press, New York 1958); J. Galambos, The 
Asymptotic Theory of Extreme Order Statistics (Wiley, New York 1978). Gumbel gives a short historical summary in his 
book. 

[3] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipies, Second Edition (Cambridge 
University Press 1992). 

[4] M. Abramowitz and I. A. Stegun, Handbooii of Mathematical Functions (Dover Publications, New York 1974). 
[5] Record times are discussed, for example, by Galambos (see Ref. 2). 
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